43 research outputs found

    Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia

    Hyperlinks are an essential feature of the World Wide Web. They are especially important for online encyclopedias such as Wikipedia: an article can often only be understood in the context of related articles, and hyperlinks make it easy to explore this context. But important links are often missing, and several methods have been proposed to alleviate this problem by learning a linking model based on the structure of the existing links. Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia's navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality.
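The candidate-mining step described in this abstract can be illustrated with a minimal sketch. The abstract does not specify how candidates are formed or ranked, so everything below is an assumption made for illustration: a pair (source, target) is treated as a candidate when the target is visited after the source on some path and no direct link exists yet, and candidates are ranked by the number of paths that support them. The function name `candidate_links` and the toy data are hypothetical.

```python
from collections import Counter


def candidate_links(paths, existing_links):
    """Rank (source, target) pairs that co-occur on paths but are not yet linked.

    Assumption: a pair is a candidate if `target` appears after `source`
    on a navigation path and (source, target) is not an existing link.
    Assumption: candidates are ranked by how many paths contain them.
    """
    counts = Counter()
    for path in paths:
        pairs_on_path = set()  # count each pair at most once per path
        for i, source in enumerate(path):
            for target in path[i + 1:]:
                pair = (source, target)
                if source != target and pair not in existing_links:
                    pairs_on_path.add(pair)
        counts.update(pairs_on_path)
    return counts.most_common()


# Toy data (hypothetical): three navigation paths over four articles.
paths = [["A", "B", "C"], ["A", "D", "C"], ["B", "C"]]
existing_links = {("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")}
ranked = candidate_links(paths, existing_links)
```

Here the only unlinked pair that navigators traverse indirectly is ("A", "C"), supported by two paths. A real ranking would likely use richer signals than raw path counts.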

    Effective and Efficient Similarity Index for Link Prediction of Complex Networks

    Predictions of missing links in incomplete networks, such as protein-protein interaction networks, or of very likely but not yet existent links in evolutionary networks, such as friendship networks in web society, can serve as a guideline for further experiments or as valuable information for web users. In this paper, we introduce a local path index to estimate the likelihood of the existence of a link between two nodes. We propose a network model with controllable density and noise strength in generating links, and collect data on six real networks. Extensive numerical simulations on both modeled networks and real networks demonstrate the high effectiveness and efficiency of the local path index compared with two well-known and widely used indices, the common neighbors and the Katz index. Indeed, the local path index provides predictions as accurate as those of the Katz index while requiring much less CPU time and memory space, which makes it a strong candidate for practical applications in data mining of huge networks. Comment: 8 pages, 5 figures, 3 tables
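The abstract does not state the formula for the local path index. The sketch below assumes the commonly cited form S = A^2 + εA^3, where A is the adjacency matrix and ε is a small free parameter that weights paths of length three. The helper names and the toy graph are hypothetical, and plain nested lists are used instead of a matrix library to keep the example self-contained.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path_index(adj, epsilon=0.01):
    """Score every node pair as (A^2 + epsilon * A^3)[i][j].

    Assumption: this is the usual definition of the local path index;
    the abstract itself does not spell it out.
    """
    a2 = mat_mul(adj, adj)   # counts paths of length 2
    a3 = mat_mul(a2, adj)    # counts walks of length 3
    n = len(adj)
    return [[a2[i][j] + epsilon * a3[i][j] for j in range(n)]
            for i in range(n)]


# Toy graph (hypothetical): an undirected chain 0 - 1 - 2 - 3.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = local_path_index(adj, epsilon=0.5)
```

With ε = 0, the score reduces to the common-neighbors count. A nonzero ε lets pairs with no common neighbor, such as nodes 0 and 3 here, still receive a nonzero score.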

    Just-for-Me: An Adaptive Personalization System for Location-Aware Social Music Recommendation

    The fast growth of online communities and the increasing popularity of internet-accessing smart devices have significantly changed the way people consume and share music. As an emerging technology to facilitate effective music retrieval on the move, intelligent recommendation has received great attention in recent years. While a large amount of effort has been invested in the field, the technology is still in its infancy. One of the major reasons for this stagnation is the inability of existing approaches to comprehensively take multiple kinds of contextual information into account. In this paper, we present a novel recommender system called Just-for-Me to facilitate effective social music recommendation by considering users' location-related contexts as well as global music popularity trends. We also develop a unified recommendation model to integrate the contextual factors and music content simultaneously. Furthermore, pseudo-observations are proposed to overcome the cold-start and sparsity problems. An extensive experimental study based on different test collections demonstrates that the Just-for-Me system can significantly improve recommendation performance at various geo-locations.

    Empirical analysis of web-based user-object bipartite networks

    Understanding the structure and evolution of web-based user-object networks is a significant task since they play a crucial role in e-commerce nowadays. This Letter reports an empirical analysis of two large-scale web sites, audioscrobbler.com and del.icio.us, where users are connected with music groups and bookmarks, respectively. The degree distributions and degree-degree correlations for both users and objects are reported. We propose a new index, named the collaborative clustering coefficient, to quantify the clustering behavior based on collaborative selection. Accordingly, the clustering properties and clustering-degree correlations are investigated. We report some novel phenomena that characterize the selection mechanism of web users and outline the relevance of these phenomena to the information recommendation problem. Comment: 6 pages, 7 figures and 1 table

    Link prediction in complex networks: a local naïve Bayes model

    The common-neighbor-based method is a simple yet effective way to predict missing links. It assumes that two nodes are more likely to be connected if they have more common neighbors. In such a method, each common neighbor of two nodes contributes equally to the connection likelihood. In this Letter, we argue that different common neighbors may play different roles and thus make different contributions, and we propose a local naïve Bayes model accordingly. Extensive experiments were carried out on eight real networks. Compared with the common-neighbor-based methods, the present method provides more accurate predictions. Finally, we give a detailed case study on the US air transportation network. Comment: 6 pages, 2 figures, 2 tables

    Predicting Missing Links via Local Information

    Missing link prediction in networks is of both theoretical interest and practical significance in modern science. In this paper, we empirically investigate a simple framework of link prediction on the basis of node similarity. We compare nine well-known local similarity measures on six real networks. The results indicate that the simplest measure, namely common neighbors, has the best overall performance, and the Adamic-Adar index performs second best. A new similarity measure, motivated by the resource allocation process taking place on networks, is proposed and shown to have higher prediction accuracy than common neighbors. It is found that many links are assigned the same score if only the information of the nearest neighbors is used. We therefore design another new measure that exploits information of the next nearest neighbors, which can remarkably enhance the prediction accuracy. Comment: For International Workshop: "The Physics Approach To Risk: Agent-Based Models and Networks", http://intern.sg.ethz.ch/cost-p10
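Three of the local similarity measures named in this abstract can be sketched as follows. The abstract gives no formulas, so the code assumes the standard definitions: common neighbors counts the shared neighbors of x and y, Adamic-Adar sums 1 / log(k_z) over shared neighbors z of degree k_z, and the resource allocation index sums 1 / k_z. The function names and the toy edge list are hypothetical.

```python
import math


def build_neighbors(edges):
    """Build a node -> set-of-neighbors map from undirected edges."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def common_neighbors(nbrs, x, y):
    """Number of nodes adjacent to both x and y."""
    return len(nbrs[x] & nbrs[y])


def adamic_adar(nbrs, x, y):
    """Sum of 1 / log(degree) over common neighbors.

    A common neighbor of two distinct nodes has degree >= 2,
    so the logarithm is always positive.
    """
    return sum(1.0 / math.log(len(nbrs[z])) for z in nbrs[x] & nbrs[y])


def resource_allocation(nbrs, x, y):
    """Sum of 1 / degree over common neighbors."""
    return sum(1.0 / len(nbrs[z]) for z in nbrs[x] & nbrs[y])


# Toy graph (hypothetical): a and b share neighbors c (degree 2) and d (degree 3).
edges = [("a", "c"), ("b", "c"), ("a", "d"), ("b", "d"), ("d", "e")]
nbrs = build_neighbors(edges)
cn = common_neighbors(nbrs, "a", "b")
aa = adamic_adar(nbrs, "a", "b")
ra = resource_allocation(nbrs, "a", "b")
```

All three measures use only nearest-neighbor information, which is why many node pairs end up with identical scores. The resource allocation index penalizes high-degree common neighbors more strongly than Adamic-Adar does.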

    Offering collaborative-like recommendations when data is sparse: The case of attraction-weighted information filtering

    We propose a low-dimensional weighting scheme to map information filtering recommendations into more relevant, collaborative filtering-like recommendations. As in content-based systems, the closest (most similar) items are recommended, but distances between items are weighted by attraction indexes representing existing customers' preferences. Hence, the most preferred items are closer to all the other points in the space, and consequently more likely to be recommended. The approach is especially suitable when data is sparse, since attraction weights need only be computed across items, rather than for all user-item pairs. A first study conducted with consumers in an online bookseller context indicates that our approach has merits: recommendations made by our attraction-weighted information filtering recommender system significantly outperform pure information filtering recommendations and compare favorably to data-hungry collaborative filtering systems.